{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Advanced RAG with LlamaParse\n",
    "\n",
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_parse/blob/main/examples/demo_advanced.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
    "\n",
    "This notebook is a complete walkthrough of using LlamaParse with advanced indexing and retrieval techniques in LlamaIndex over Apple's 2021 10-K filing.\n",
    "\n",
    "This lets us answer sophisticated questions, such as lookups in financial tables, that \"naive\" parsing and indexing techniques handle poorly with the same models.\n",
    "\n",
    "Note: this example uses `llama_index >= 0.10.4`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index\n",
    "!pip install llama-index-core==0.10.6.post1\n",
    "!pip install llama-index-embeddings-openai\n",
    "!pip install llama-index-postprocessor-flag-embedding-reranker\n",
    "!pip install git+https://github.com/FlagOpen/FlagEmbedding.git\n",
    "!pip install llama-parse"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!wget \"https://s2.q4cdn.com/470004039/files/doc_financials/2021/q4/_10-K-2021-(As-Filed).pdf\" -O apple_2021_10k.pdf"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set the API keys for LlamaParse (LlamaCloud) and OpenAI."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# llama-parse is async-first; running async code in a notebook requires nest_asyncio\n",
    "import nest_asyncio\n",
    "\n",
    "nest_asyncio.apply()\n",
    "\n",
    "import os\n",
    "\n",
    "# API access to llama-cloud\n",
    "os.environ[\"LLAMA_CLOUD_API_KEY\"] = \"llx-...\"\n",
    "\n",
    "# Using OpenAI API for embeddings/llms\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"sk-...\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.llms.openai import OpenAI\n",
    "from llama_index.embeddings.openai import OpenAIEmbedding\n",
    "from llama_index.core import VectorStoreIndex\n",
    "from llama_index.core import Settings\n",
    "\n",
    "embed_model = OpenAIEmbedding(model=\"text-embedding-3-small\")\n",
    "llm = OpenAI(model=\"gpt-3.5-turbo-0125\")\n",
    "\n",
    "Settings.llm = llm\n",
    "Settings.embed_model = embed_model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using the new `LlamaParse` PDF reader for PDF parsing\n",
    "\n",
    "We also compare two different retrieval/query engine strategies:\n",
    "1. Using the raw Markdown text as nodes to build the index, then applying a simple query engine to generate the results;\n",
    "2. Using `MarkdownElementNodeParser` to parse the Markdown output of `LlamaParse`, then building a recursive retriever query engine for generation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Started parsing the file under job_id cac11eca-71db-4dab-b72b-c67d31e551f3\n"
     ]
    }
   ],
   "source": [
    "from llama_parse import LlamaParse\n",
    "\n",
    "# Parse the PDF into Markdown; the API key is read from LLAMA_CLOUD_API_KEY\n",
    "documents = LlamaParse(result_type=\"markdown\").load_data(\"./apple_2021_10k.pdf\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from copy import deepcopy\n",
    "from llama_index.core.schema import TextNode\n",
    "from llama_index.core import VectorStoreIndex\n",
    "\n",
    "\n",
    "def get_page_nodes(docs, separator=\"\\n---\\n\"):\n",
    "    \"\"\"Split each document into one node per page, using the page separator.\"\"\"\n",
    "    nodes = []\n",
    "    for doc in docs:\n",
    "        doc_chunks = doc.text.split(separator)\n",
    "        for doc_chunk in doc_chunks:\n",
    "            node = TextNode(\n",
    "                text=doc_chunk,\n",
    "                metadata=deepcopy(doc.metadata),\n",
    "            )\n",
    "            nodes.append(node)\n",
    "\n",
    "    return nodes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "page_nodes = get_page_nodes(documents)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.node_parser import MarkdownElementNodeParser\n",
    "\n",
    "node_parser = MarkdownElementNodeParser(\n",
    "    llm=OpenAI(model=\"gpt-3.5-turbo-0125\"), num_workers=8\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Parse the Markdown into text and table nodes; tables are summarized with the LLM\n",
    "nodes = node_parser.get_nodes_from_documents(documents)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Separate the plain text nodes from the index nodes (objects) that reference the tables\n",
    "base_nodes, objects = node_parser.get_nodes_and_objects(nodes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\"This table provides information about a company's state of incorporation or organization and its corresponding I.R.S. Employer Identification Number.,\\nwith the following table title:\\nCompany Incorporation Information,\\nwith the following columns:\\n- California: None\\n- 94-2404110: None\\n\""
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "objects[0].get_content()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# dump both indexed tables and page text into the vector index\n",
    "recursive_index = VectorStoreIndex(nodes=base_nodes + objects + page_nodes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "# Apple Inc.\n",
      "\n",
      "**CONSOLIDATED STATEMENTS OF OPERATIONS (In millions, except number of shares which are reflected in thousands and per share amounts)**\n",
      "| |September 25, 2021|September 26, 2020|September 28, 2019|\n",
      "|---|---|---|---|\n",
      "|Net sales:|$297,392|$220,747|$213,883|\n",
      "|Products| | | |\n",
      "|Services|$68,425|$53,768|$46,291|\n",
      "|Total net sales|$365,817|$274,515|$260,174|\n",
      "|Cost of sales:| | | |\n",
      "|Products|$192,266|$151,286|$144,996|\n",
      "|Services|$20,715|$18,273|$16,786|\n",
      "|Total cost of sales|$212,981|$169,559|$161,782|\n",
      "|Gross margin|$152,836|$104,956|$98,392|\n",
      "|Operating expenses:| | | |\n",
      "|Research and development|$21,914|$18,752|$16,217|\n",
      "|Selling, general and administrative|$21,973|$19,916|$18,245|\n",
      "|Total operating expenses|$43,887|$38,668|$34,462|\n",
      "|Operating income|$108,949|$66,288|$63,930|\n",
      "|Other income/(expense), net|$258|$803|$1,807|\n",
      "|Income before provision for income taxes|$109,207|$67,091|$65,737|\n",
      "|Provision for income taxes|$14,527|$9,680|$10,481|\n",
      "|Net income|$94,680|$57,411|$55,256|\n",
      "|Earnings per share:| | | |\n",
      "|Basic|$5.67|$3.31|$2.99|\n",
      "|Diluted|$5.61|$3.28|$2.97|\n",
      "|Shares used in computing earnings per share:| | | |\n",
      "|Basic|16,701,272|17,352,119|18,471,336|\n",
      "|Diluted|16,864,919|17,528,214|18,595,651|\n",
      "\n",
      "See accompanying Notes to Consolidated Financial Statements.\n"
     ]
    }
   ],
   "source": [
    "print(page_nodes[31].get_content())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.postprocessor.flag_embedding_reranker import FlagEmbeddingReranker\n",
    "\n",
    "reranker = FlagEmbeddingReranker(\n",
    "    top_n=5,\n",
    "    model=\"BAAI/bge-reranker-large\",\n",
    ")\n",
    "\n",
    "recursive_query_engine = recursive_index.as_query_engine(\n",
    "    similarity_top_k=5, node_postprocessors=[reranker], verbose=True\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "233\n"
     ]
    }
   ],
   "source": [
    "print(len(nodes))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup Baseline\n",
    "\n",
    "For comparison, we set up a naive RAG pipeline with default PDF parsing and standard chunking, indexing, and retrieval."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import SimpleDirectoryReader\n",
    "\n",
    "reader = SimpleDirectoryReader(input_files=[\"apple_2021_10k.pdf\"])\n",
    "base_docs = reader.load_data()\n",
    "raw_index = VectorStoreIndex.from_documents(base_docs)\n",
    "raw_query_engine = raw_index.as_query_engine(\n",
    "    similarity_top_k=5, node_postprocessors=[reranker]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using the new `LlamaParse` for PDF parsing and retrieving tables with two different methods\n",
    "We compare the baseline query engine with the recursive query engine on table questions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Table Query Task: Queries for Table Question Answering"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "***********Basic Query Engine***********\n",
      "The purchases of marketable securities in 2020 amounted to $163.4 billion.\n",
      "\u001b[1;3;38;2;11;159;203mRetrieval entering 59368b87-e602-4bd1-88a7-7526fd6ab83f: TextNode\n",
      "\u001b[0m\u001b[1;3;38;2;237;90;200mRetrieving from object TextNode with query Purchases of marketable securities in 2020\n",
      "\u001b[0m\u001b[1;3;38;2;11;159;203mRetrieval entering dfd97f47-eb4d-4bab-8a22-9bbbc0096a4b: TextNode\n",
      "\u001b[0m\u001b[1;3;38;2;237;90;200mRetrieving from object TextNode with query Purchases of marketable securities in 2020\n",
      "\u001b[0m\n",
      "***********New LlamaParse+ Recursive Retriever Query Engine***********\n",
      "$114,938\n"
     ]
    }
   ],
   "source": [
    "query = \"Purchases of marketable securities in 2020\"\n",
    "\n",
    "response_1 = raw_query_engine.query(query)\n",
    "print(\"\\n***********Basic Query Engine***********\")\n",
    "print(response_1)\n",
    "\n",
    "response_2 = recursive_query_engine.query(query)\n",
    "print(\"\\n***********New LlamaParse+ Recursive Retriever Query Engine***********\")\n",
    "print(response_2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "This table provides information on hedged assets and liabilities for the years 2021 and 2020, including current and non-current marketable securities and term debt.,\n",
      "with the following table title:\n",
      "Hedged Assets and Liabilities Summary,\n",
      "with the following columns:\n",
      "- 2021: None\n",
      "- 2020: None\n",
      "\n",
      "| |2021|2020|\n",
      "|---|---|---|\n",
      "|Hedged assets/(liabilities):| | |\n",
      "|Current and non-current marketable securities|$15,954|$16,270|\n",
      "|Current and non-current term debt|$(17,857)|$(21,033)|\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print(response_2.source_nodes[2].get_content())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "***********Basic Query Engine***********\n",
      "0.03%, 0.75%, 1.43%\n",
      "\u001b[1;3;38;2;11;159;203mRetrieval entering a5afa785-217f-4e72-87cf-15da11632ec0: TextNode\n",
      "\u001b[0m\u001b[1;3;38;2;237;90;200mRetrieving from object TextNode with query effective interest rates of all debt issuances in 2021\n",
      "\u001b[0m\n",
      "***********New LlamaParse+ Recursive Retriever Query Engine***********\n",
      "0.48% – 0.63%, 0.03% – 4.78%, 0.75% – 2.81%, 1.43% – 2.86%\n"
     ]
    }
   ],
   "source": [
    "query = \"effective interest rates of all debt issuances in 2021\"\n",
    "\n",
    "response_1 = raw_query_engine.query(query)\n",
    "print(\"\\n***********Basic Query Engine***********\")\n",
    "print(response_1)\n",
    "\n",
    "response_2 = recursive_query_engine.query(query)\n",
    "print(\"\\n***********New LlamaParse+ Recursive Retriever Query Engine***********\")\n",
    "print(response_2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Term Debt\n",
      "As of September 25, 2021 , the Company had outstanding floating- and fixed-rate notes with varying maturities for an aggregate \n",
      "principal amount of $118.1 billion  (collectively the “Notes”). The Notes are senior unsecured obligations and interest is payable in \n",
      "arrears. The following table provides a summary of the Company’s term debt as of September 25, 2021  and September 26, \n",
      "2020 :\n",
      "Maturities\n",
      "(calendar year)2021 2020\n",
      "Amount\n",
      "(in millions)Effective\n",
      "Interest RateAmount\n",
      "(in millions)Effective\n",
      "Interest Rate\n",
      "2013 – 2020 debt issuances:\n",
      "Floating-rate notes  2022 $ 1,750 0.48%  – 0.63% $ 2,250 0.60%  – 1.39%\n",
      "Fixed-rate 0.000%  – 4.650%  notes 2022  – 2060  95,813 0.03%  – 4.78%  103,828 0.03%  – 4.78%\n",
      "Second quarter 2021 debt issuance:\n",
      "Fixed-rate 0.700%  – 2.800%  notes 2026  – 2061  14,000 0.75%  – 2.81%  —  — %\n",
      "Fourth quarter 2021 debt issuance:\n",
      "Fixed-rate 1.400%  – 2.850%  notes 2028  – 2061  6,500 1.43%  – 2.86%  —  — %\n",
      "Total term debt  118,063  106,078 \n",
      "Unamortized premium/(discount) and issuance \n",
      "costs, net  (380)  (314) \n",
      "Hedge accounting fair value adjustments  1,036  1,676 \n",
      "Less: Current portion of term debt  (9,613)  (8,773) \n",
      "Total non-current portion of term debt $ 109,106 $ 98,667 \n",
      "To manage interest rate risk on certain of its U.S. dollar–denominated fixed- or floating-rate notes, the Company has entered into \n",
      "interest rate swaps to effectively convert the fixed interest rates to floating interest rates or the floating interest rates to fixed \n",
      "interest rates on a portion of these notes. Additionally, to manage foreign currency risk on certain of its foreign currency–\n",
      "denominated notes, the Company has entered into foreign currency swaps to effectively convert these notes to U.S. dollar–\n",
      "denominated notes.\n",
      "The effective interest rates for the Notes include the interest on the Notes, amortization of the discount or premium and, if \n",
      "applicable, adjustments related to hedging. The Company recogni zed $2.6 billion , $2.8 billion  and $3.2 billion  of interest expense \n",
      "on its term debt for 2021 , 2020  and 2019 , respectively.\n",
      "The future principal payments for the Company’s Notes as of September 25, 2021 , are as follows (in millions):\n",
      "2022 $ 9,583 \n",
      "2023  11,391 \n",
      "2024  10,202 \n",
      "2025  10,914 \n",
      "2026  11,408 \n",
      "Thereafter  64,565 \n",
      "Total term debt $ 118,063 \n",
      "As of September 25, 2021  and September 26, 2020 , the fair value of the Company’s Notes, based on Level 2 inputs, was $125.3 \n",
      "billion  and $117.1 billion , respectively.\n",
      "Apple Inc. | 2021  Form 10-K | 45\n"
     ]
    }
   ],
   "source": [
    "print(response_1.source_nodes[0].get_content())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "***********Basic Query Engine***********\n",
      "The U.S. Tax Cuts and Jobs Act of 2017 had an impact on income taxes in 2020, as evidenced by a decrease in the provision for income taxes compared to the prior year.\n",
      "\u001b[1;3;38;2;11;159;203mRetrieval entering b9416f35-ebf1-45d6-9a29-b59e435ab42d: TextNode\n",
      "\u001b[0m\u001b[1;3;38;2;237;90;200mRetrieving from object TextNode with query Impacts of the U.S. Tax Cuts and Jobs Act of 2017 on income taxes in 2020\n",
      "\u001b[0m\u001b[1;3;38;2;11;159;203mRetrieval entering 8d8d5733-ff30-4535-9376-7f761b5900ea: TextNode\n",
      "\u001b[0m\u001b[1;3;38;2;237;90;200mRetrieving from object TextNode with query Impacts of the U.S. Tax Cuts and Jobs Act of 2017 on income taxes in 2020\n",
      "\u001b[0m\u001b[1;3;38;2;11;159;203mRetrieval entering 82f301e5-199a-4aa2-bbdf-ef97898c0326: TextNode\n",
      "\u001b[0m\u001b[1;3;38;2;237;90;200mRetrieving from object TextNode with query Impacts of the U.S. Tax Cuts and Jobs Act of 2017 on income taxes in 2020\n",
      "\u001b[0m\u001b[1;3;38;2;11;159;203mRetrieval entering 86f666b4-254b-487f-9870-8ee09aef07a9: TextNode\n",
      "\u001b[0m\u001b[1;3;38;2;237;90;200mRetrieving from object TextNode with query Impacts of the U.S. Tax Cuts and Jobs Act of 2017 on income taxes in 2020\n",
      "\u001b[0m\n",
      "***********New LlamaParse+ Recursive Retriever Query Engine***********\n",
      "The U.S. Tax Cuts and Jobs Act of 2017 had a negative impact on income taxes in 2020.\n"
     ]
    }
   ],
   "source": [
    "query = \"Impacts of the U.S. Tax Cuts and Jobs Act of 2017 on income taxes in 2020\"\n",
    "\n",
    "response_1 = raw_query_engine.query(query)\n",
    "print(\"\\n***********Basic Query Engine***********\")\n",
    "print(response_1)\n",
    "\n",
    "response_2 = recursive_query_engine.query(query)\n",
    "print(\"\\n***********New LlamaParse+ Recursive Retriever Query Engine***********\")\n",
    "print(response_2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Other Income/(Expense), Net\n",
      "The following table shows the detail of OI&E for 2021 , 2020  and 2019  (in millions):\n",
      "2021 2020 2019\n",
      "Interest and dividend income $ 2,843 $ 3,763 $ 4,961 \n",
      "Interest expense  (2,645)  (2,873)  (3,576) \n",
      "Other income/(expense), net  60  (87)  422 \n",
      "Total other income/(expense), net $ 258 $ 803 $ 1,807 \n",
      "Note 5 – Income Taxe s\n",
      "Provision for Income Taxes and Effective  Tax Rat e\n",
      "The provision for income taxes for 2021 , 2020  and 2019 , consisted of the following (in millions):\n",
      "2021 2020 2019\n",
      "Federal:\n",
      "Current $ 8,257 $ 6,306 $ 6,384 \n",
      "Deferred  (7,176)  (3,619)  (2,939) \n",
      "Total  1,081  2,687  3,445 \n",
      "State:\n",
      "Current  1,620  455  475 \n",
      "Deferred  (338)  21  (67) \n",
      "Total  1,282  476  408 \n",
      "Foreign:\n",
      "Current  9,424  3,134  3,962 \n",
      "Deferred  2,740  3,383  2,666 \n",
      "Total  12,164  6,517  6,628 \n",
      "Provision for income taxes $ 14,527 $ 9,680 $ 10,481 \n",
      "The foreign provision for income taxes is based on foreign pretax earnings of $68.7 billion , $38.1 billion  and $44.3 billion  in 2021 , \n",
      "2020  and 2019 , respectively.\n",
      "A reconciliation of the provision for income taxes, with the amount computed by applying the statutory federal income tax  rate \n",
      "(21% in 2021 , 2020  and 2019 ) to income before provision for income taxes for 2021 , 2020  and 2019 , is as follows (dollars in \n",
      "millions):\n",
      "2021 2020 2019\n",
      "Computed expected tax $ 22,933 $ 14,089 $ 13,805 \n",
      "State taxes, net of federal effect  1,151  423  423 \n",
      "Impacts of the U.S. Tax Cuts and Jobs Act of 2017  —  (582)  — \n",
      "Earnings of foreign subsidiaries  (4,715)  (2,534)  (2,625) \n",
      "Foreign-derived intangible income deduction  (1,372)  (169)  (149) \n",
      "Research and development credit, net  (1,033)  (728)  (548) \n",
      "Excess tax benefits from equity awards  (2,137)  (930)  (639) \n",
      "Other  (300)  111  214 \n",
      "Provision for income taxes $ 14,527 $ 9,680 $ 10,481 \n",
      "Effective tax rate  13.3%  14.4%  15.9% \n",
      "Apple Inc. | 2021  Form 10-K | 41\n"
     ]
    }
   ],
   "source": [
    "print(response_1.source_nodes[0].get_content())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "***********Basic Query Engine***********\n",
      "$3,619 million in 2019, $7,176 million in 2020, and $1,081 million in 2021\n",
      "\u001b[1;3;38;2;11;159;203mRetrieval entering 12b1355a-f9e6-4b08-a19a-3ffc00dc5b9f: TextNode\n",
      "\u001b[0m\u001b[1;3;38;2;237;90;200mRetrieving from object TextNode with query federal deferred tax in 2019-2021\n",
      "\u001b[0m\u001b[1;3;38;2;11;159;203mRetrieval entering 82f301e5-199a-4aa2-bbdf-ef97898c0326: TextNode\n",
      "\u001b[0m\u001b[1;3;38;2;237;90;200mRetrieving from object TextNode with query federal deferred tax in 2019-2021\n",
      "\u001b[0m\u001b[1;3;38;2;11;159;203mRetrieval entering 8d8d5733-ff30-4535-9376-7f761b5900ea: TextNode\n",
      "\u001b[0m\u001b[1;3;38;2;237;90;200mRetrieving from object TextNode with query federal deferred tax in 2019-2021\n",
      "\u001b[0m\n",
      "***********New LlamaParse+ Recursive Retriever Query Engine***********\n",
      "$2,939, $3,619, $7,176\n"
     ]
    }
   ],
   "source": [
    "query = \"federal deferred tax in 2019-2021\"\n",
    "\n",
    "response_1 = raw_query_engine.query(query)\n",
    "print(\"\\n***********Basic Query Engine***********\")\n",
    "print(response_1)\n",
    "\n",
    "response_2 = recursive_query_engine.query(query)\n",
    "print(\"\\n***********New LlamaParse+ Recursive Retriever Query Engine***********\")\n",
    "print(response_2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "***********Basic Query Engine***********\n",
      "State deferred income tax for 2019: $454 million\n",
      "State deferred income tax for 2020: $21 million\n",
      "State deferred income tax for 2021: -$338 million\n",
      "\u001b[1;3;38;2;11;159;203mRetrieval entering 12b1355a-f9e6-4b08-a19a-3ffc00dc5b9f: TextNode\n",
      "\u001b[0m\u001b[1;3;38;2;237;90;200mRetrieving from object TextNode with query give me the deferred state income tax in 2019-2021 (include +/-)\n",
      "\u001b[0m\u001b[1;3;38;2;11;159;203mRetrieval entering 8d8d5733-ff30-4535-9376-7f761b5900ea: TextNode\n",
      "\u001b[0m\u001b[1;3;38;2;237;90;200mRetrieving from object TextNode with query give me the deferred state income tax in 2019-2021 (include +/-)\n",
      "\u001b[0m\n",
      "***********New LlamaParse+ Recursive Retriever Query Engine***********\n",
      "Deferred state income tax for the years 2019-2021:\n",
      "- 2019: ($67) million\n",
      "- 2020: $21 million\n",
      "- 2021: ($338) million\n"
     ]
    }
   ],
   "source": [
    "query = \"give me the deferred state income tax in 2019-2021 (include +/-)\"\n",
    "\n",
    "response_1 = raw_query_engine.query(query)\n",
    "print(\"\\n***********Basic Query Engine***********\")\n",
    "print(response_1)\n",
    "\n",
    "response_2 = recursive_query_engine.query(query)\n",
    "print(\"\\n***********New LlamaParse+ Recursive Retriever Query Engine***********\")\n",
    "print(response_2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Summary of income tax provisions for Federal, State, and Foreign entities over the years 2019, 2020, and 2021.,\n",
      "with the following table title:\n",
      "Income Tax Provisions by Entity and Year,\n",
      "with the following columns:\n",
      "- Entity: The type of entity (Federal, State, Foreign)\n",
      "- 2019: Income tax provisions for the year 2019\n",
      "- 2020: Income tax provisions for the year 2020\n",
      "- 2021: Income tax provisions for the year 2021\n",
      "\n",
      "| |2021|2020|2019|\n",
      "|---|---|---|---|\n",
      "|Federal:| | | |\n",
      "|Current|$8,257|$6,306|$6,384|\n",
      "|Deferred|(7,176)|(3,619)|(2,939)|\n",
      "|Total|1,081|2,687|3,445|\n",
      "|State:| | | |\n",
      "|Current|1,620|455|475|\n",
      "|Deferred|(338)|21|(67)|\n",
      "|Total|1,282|476|408|\n",
      "|Foreign:| | | |\n",
      "|Current|9,424|3,134|3,962|\n",
      "|Deferred|2,740|3,383|2,666|\n",
      "|Total|12,164|6,517|6,628|\n",
      "|Provision for income taxes|$14,527|$9,680|$10,481|\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print(response_2.source_nodes[0].get_content())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "***********Basic Query Engine***********\n",
      "$1,620 million in 2019, $455 million in 2020, $475 million in 2021\n",
      "\u001b[1;3;38;2;11;159;203mRetrieval entering 82f301e5-199a-4aa2-bbdf-ef97898c0326: TextNode\n",
      "\u001b[0m\u001b[1;3;38;2;237;90;200mRetrieving from object TextNode with query current state taxes per year in 2019-2021 (include +/-)\n",
      "\u001b[0m\u001b[1;3;38;2;11;159;203mRetrieval entering 8d8d5733-ff30-4535-9376-7f761b5900ea: TextNode\n",
      "\u001b[0m\u001b[1;3;38;2;237;90;200mRetrieving from object TextNode with query current state taxes per year in 2019-2021 (include +/-)\n",
      "\u001b[0m\u001b[1;3;38;2;11;159;203mRetrieval entering b9416f35-ebf1-45d6-9a29-b59e435ab42d: TextNode\n",
      "\u001b[0m\u001b[1;3;38;2;237;90;200mRetrieving from object TextNode with query current state taxes per year in 2019-2021 (include +/-)\n",
      "\u001b[0m\u001b[1;3;38;2;11;159;203mRetrieval entering a029e464-575f-4dd6-afad-7cc0bbc5dbf9: TextNode\n",
      "\u001b[0m\u001b[1;3;38;2;237;90;200mRetrieving from object TextNode with query current state taxes per year in 2019-2021 (include +/-)\n",
      "\u001b[0m\n",
      "***********New LlamaParse+ Recursive Retriever Query Engine***********\n",
      "$475 in 2019, $455 in 2020, $1,620 in 2021.\n"
     ]
    }
   ],
   "source": [
    "query = \"current state taxes per year in 2019-2021 (include +/-)\"\n",
    "\n",
    "response_1 = raw_query_engine.query(query)\n",
    "print(\"\\n***********Basic Query Engine***********\")\n",
    "print(response_1)\n",
    "\n",
    "response_2 = recursive_query_engine.query(query)\n",
    "print(\"\\n***********New LlamaParse+ Recursive Retriever Query Engine***********\")\n",
    "print(response_2)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llama_parse",
   "language": "python",
   "name": "llama_parse"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
